Stemming in Tamil for Affix Stripping
Abstract
Stemming is one of the most important steps in many natural language processing tasks. It reduces inflected words to a common stem or root word. Stemming has mainly been studied for English, because Tamil has a more complex structure and intricate grammatical rules. Tamil is a Dravidian language spoken mainly by the Tamil people. Tamil words have more derivational forms than those of many other languages, and they inflect for number, person, gender and tense. Taking this complexity into account, this work proposes an effective stemmer for Tamil that finds the stem or root word of each Tamil input word. The stemmer is built on an affix stripping algorithm, which removes the prefix and/or suffix from the given Tamil input according to a set of rules, and it aims to reduce over-stemming and under-stemming errors. The input is Tamil text. The raw input first goes through normalization, feature extraction and tokenization, which remove special characters, remove stop words, and divide the input into tokens, respectively. The tokens are then fed into the stemmer. Finally, the collection of stemmed words is displayed as output, and the performance and efficiency of the stemmer are evaluated.
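The pipeline described in the abstract (normalization, stop-word removal, tokenization, then rule-based affix stripping) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the two suffix rules, the minimum stem length, and all function names are assumptions chosen for illustration, and only suffixes are handled. Each rule pairs a suffix with a replacement so that the virama can be restored after a case ending is removed, and the length guard is one simple way to limit over-stemming.

```python
import re

# Hypothetical stop words (illustrative only, not a real Tamil stop list).
STOP_WORDS = {"ஒரு", "மற்றும்"}  # "a/one", "and"

# Hypothetical (suffix, replacement) rules, written as escapes because the
# first suffix starts with a combining vowel sign.
SUFFIX_RULES = [
    ("\u0bbf\u0bb2\u0bcd", "\u0bcd"),  # locative "-il": strip it, restore the virama
    ("\u0b95\u0bb3\u0bcd", ""),        # plural "-kal" (கள்)
]


def normalize(text):
    """Replace every character outside the Tamil Unicode block with a space."""
    return re.sub(r"[^\u0B80-\u0BFF\s]", " ", text)


def tokenize(text):
    """Split normalized text on whitespace and drop stop words."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def stem(word, min_stem_len=2):
    """Strip suffixes repeatedly until no rule applies.

    A rule is skipped when the result would be shorter than min_stem_len
    code points, which guards against over-stemming very short words.
    """
    changed = True
    while changed:
        changed = False
        for suffix, replacement in SUFFIX_RULES:
            if word.endswith(suffix):
                candidate = word[:len(word) - len(suffix)] + replacement
                if len(candidate) >= min_stem_len:
                    word = candidate
                    changed = True
                    break
    return word


def stem_text(text):
    """Run the full pipeline: normalize, tokenize, then stem each token."""
    return [stem(tok) for tok in tokenize(normalize(text))]


if __name__ == "__main__":
    # "வீடுகளில்" (in houses) and "நாடுகள்" (countries)
    print(stem_text("ஒரு வீடுகளில், நாடுகள்!"))  # ['வீடு', 'நாடு']
```

In this sketch the word "வீடுகளில்" is reduced in two passes, first to "வீடுகள்" and then to "வீடு", while the standalone word "கள்" is left untouched by the length guard. A realistic stemmer would need a much larger rule set and handling of Tamil sandhi changes, which this example does not attempt.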
Similar resources
An Improved Rule based Iterative Affix Stripping Stemmer for Tamil Language using K-Mean Clustering
Stemming is an important step in many of the Information Retrieval (IR) and Natural Language Processing (NLP) tasks. Stemming is usually done by removing any attached suffixes and prefixes (affixes) from index terms before the actual assignment of the term to the index. Stemming is a pre-processing step in Text Mining applications and basic requirement for many areas such as computational lingu...
Stemming Hausa text: using affix-stripping rules and reference look-up
Stemming is a process of reducing a derivational or inflectional word to its root or stem by stripping all its affixes. It has been used in applications such as information retrieval, machine translation, and text summarization, as a preprocessing step to increase efficiency. Currently, there are a few stemming algorithms which have been developed for languages such as English, Arabic, Turki...
Identifying Broken Plurals In Unvowelised Arabic Text
Irregular (so-called broken) plural identification in modern standard Arabic is a problematic issue for information retrieval (IR) and language engineering applications, but their effect on the performance of IR has never been examined. Broken plurals (BPs) are formed by altering the singular (as in English: tooth teeth) through an application of interdigitating patterns on stems, and singular ...
A Light Weight Stemmer in Kokborok
From the very beginning, stemming has played a significant role in several Natural Language Processing applications such as information retrieval (IR), machine translation (MT), morph analysis and deciding the part of speech (POS). Several stemmers have been developed for a large number of languages including Indian languages; however no work has been done in Kokborok, a native lan...
An Affix Removal Stemmer for Natural Language
Stemming is the prerequisite step in Text Mining, Spelling Checker applications as well as a basic requirement for Natural Language Processing (NLP) tasks. Also it is very important in most of the Information Retrieval (IR) systems. This paper describes an affix stripping technique for finding out the stems from context free text in Nepali Language using lexical lookup based and rule based appr...
Publication date: 2016